Abstract

Graphical manipulation of human figures is essential for certain types of human factors analyses such as reach, clearance, fit, and view. In many situations, however, the animation of simulated people performing various tasks may be based on more complicated functions involving multiple simultaneous reaches, critical timing, resource availability, and human performance capabilities. One rather effective means for creating such a simulation is through a natural language description of the tasks to be carried out. Given an anthropometrically sized figure and a geometric workplace environment, various simple actions such as reach, turn, and view can be effectively controlled from language commands or standard NASA checklist procedures. The commands may also be generated by external simulation tools. Task timing is determined from actual performance models, if available, such as strength models or Fitts’ Law. The resulting action specifications are animated in real time on a Silicon Graphics Iris workstation.

1 Introduction 

Simple computer animation is not so simple anymore. What was once acknowledged as a “good” animation is no longer acceptable. Animations are not necessarily things which are “looked at” for aesthetic purposes; they are increasingly used for practical applications in science and engineering analyses. Human figure animation, in particular, is receiving considerable attention as new display systems and robust animation software bring motion control and rendering capabilities to a widening range of users. Animations are created to evaluate the ability of people to fit or work in designed environments, determine whether workplaces satisfy their functional requirements, and analyze human task performance in a given situation. With the expanded role of animation and increased viewer sophistication, the tools for developing animations for these analytic purposes have become considerably more complex.

To gain control over complexity, animation tools are becoming “task oriented.” A system which allows a process to be described at the level best suited to the action lets the user specify that action in the least restrictive, and most natural, manner [4, 23]. This benefit becomes crucial as animation tools move out of animation production houses and into other industries and laboratories, because human factors engineers often lack the manual and artistic skills necessary for specifying animation.

The solution to this problem is twofold. New users must be educated, but the vocabulary recognized by the tools must also be modified. The obvious conclusion is that the tools must understand a “task level” vocabulary. Even with that higher level of understanding, however, communication would still be limited, since the user lacks not only the vocabulary but also the language for communication.

The ideal language for communication is the one with which the user is most comfortable. Natural language parsers, however, are complex programs [3]. Furthermore, integrating such a program into the animation environment introduces several interfacing problems [5].

We describe here a prototype system in which task animation is driven by natural language, focusing on the interface between the natural language parser and the motion generator. The paper is organized as follows. Section 2 discusses how we currently limit the scope of the problem and describes the domain in which our animations are created. Section 3 describes relevant research. Section 4 discusses how the parser and motion generator are integrated. Section 5 describes the technique used to fill in the timing information tacitly embedded in the natural language commands.


2 Problem Domain 

Since our goal is to investigate the linkage between language and task animation, the task domain is initially limited to “simple” reaches and view changes. (Karlin [17] investigated more complex motions; these will be added to the system vocabulary later.) A “simple” reach is one which requires no locomotion, only movement of the arm or upper body. A view change is a change in the orientation of a figure’s head (i.e., the figure’s view of the world changes). While seemingly very easy, these tasks already demonstrate much of the essential complexity underlying language-based animation control.

2.1 Task Environment 

The tasks to be performed and animated all center around a control panel (i.e., a finite region of more or less rigidly fixed, manually controllable objects). Because control panels are so common, many everyday tasks can be simulated this way. Some control panels encountered in a normal day-to-day routine are typewriter keyboards, elevator panels, light switches, and car dashboards. As a generic example we use the remote manipulator system control panel in the space shuttle (Figure 1), since it contains a variety of controls and indicators.

The purpose of creating the task animation is task performance analysis. In particular, we want to determine whether some person, X, can perform a task and, if so, to view the task performance. Task performance, however, depends on who is executing the task: if X has short arms, X might not be able to reach the control panel. Therefore, our task environment includes the ability to specify the anthropometric “sizing” of the people to be included [15]. The size is based on a percentile of some population data (e.g., NASA crew member trainees [1]). For example, a 50th-percentile man represents the average man in some body of data, whereas a 95th-percentile man represents a man whose size parameters are at the 95th percentile. Similar data should exist for women over some population. Figure 2 shows 50th- and 95th-percentile men and women based upon available data [21].
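Under the simplifying assumption that a body dimension is normally distributed in the population, the percentile idea can be sketched in a few lines of Python. The mean and standard deviation below are hypothetical placeholders, not values from the NASA trainee data [1] or from [21]; real anthropometric tables tabulate percentile values directly, and many dimensions are not exactly normal.

```python
from statistics import NormalDist


def dimension_at_percentile(mean_cm: float, sd_cm: float, percentile: float) -> float:
    """Body dimension (cm) at a population percentile strictly between 0 and 100.

    Assumes the dimension is normally distributed; this is an illustrative
    simplification, not how measured anthropometric tables are built.
    """
    if not 0.0 < percentile < 100.0:
        raise ValueError("percentile must lie strictly between 0 and 100")
    return NormalDist(mu=mean_cm, sigma=sd_cm).inv_cdf(percentile / 100.0)


# Hypothetical stature statistics, used only to exercise the function.
MEAN_STATURE_CM = 175.0
SD_STATURE_CM = 7.0

if __name__ == "__main__":
    for p in (5, 50, 95):
        h = dimension_at_percentile(MEAN_STATURE_CM, SD_STATURE_CM, p)
        print(f"{p}th percentile stature: {h:.1f} cm")
```

Under this assumption the 50th-percentile value coincides with the mean, which matches the description of the 50th-percentile man as the average man.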


3 Relevant Research 

Zeltzer [26] first gave names to the various “levels” of computer animation: “guiding level,” “production level,” and “task level.” Using his nomenclature, the type of system we describe here is a “task level” system. His system for controlling the walk of a human figure [25] is a specialized system for a particular task (i.e., walking). For now, our “skills” consist of reaching and viewing.

The Story Driven Animation System [22] accepts modified natural language input and creates the corresponding animation. The emphasis in this work is on story understanding and the ability to choose the correct key frames. Similar high-level (intelligent) selection among existing key frames is also demonstrated by Fishwick [10, 11].

MIRALOGIC [19] is an interesting approach to embedding a high level of understanding within an animation system. Through the use of this expert system, the user can specify rules for setting up an environment, and the system will identify inconsistencies or potential problems and suggest possible solutions.

ASAS [20], and the other object-oriented systems it exemplifies [19], can also implement task-level semantics through task decomposition, in which a task is decomposed procedurally.

These systems all address a different type of problem from the one addressed here. The tasks in our system are specified in natural (or any syntactically described artificial) language for the purpose of examining task performance. As such, it is easy to change the tasks as well as the anthropometric parameters describing the performers.

4 Integrating Language and Motion Generation

The primary focus of this work is to examine how natural language task specification and animation can be combined in an application-independent manner. The burden of this requirement falls upon the link between these two environments.

Figure 1: Space Shuttle Remote Manipulator System Control Panel

Figure 2: Anthropometrically Valid Articulated Figures: (a) 50th-percentile man; (b) 50th-percentile woman; (c) 95th-percentile man

To illustrate the situation, we will discuss a sample natural language script actually used to create an animation:


J is a 50 percent man.
S is a 50 percent woman.
J, look at switch twf-1.
J, turn twf-1 to state 4.
S, look at tglJ-1.
J, look at twf-2.
S, turn tglJ-1 on.
S, look at twf-3.
S, turn twf-3 to state 1.
J, look at twf-3.
J, look at S.
S, look at J.


This type of script is common in checklist procedures such as those performed in airplanes or space shuttles [2]. The verb “look at” represents a view change and the verb “turn” involves a simple reach. (The parser accepts a larger variety of syntactic constructions than this example illustrates [5].)

The two primary problems are specifying reach and 
view goals, and connecting object references to their 
geometric instances. 

4.1 Specifying Goals 

A goal for a reach task is the point which the hand should touch. For this particular type of task, such a goal has three positional degrees of freedom, although there are situations in which rotational degrees of freedom may be considered as well. A view goal is a point in space toward which one axis of an object must be pointed.
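The two kinds of goal can be expressed as simple error measures, sketched below in Python under our own assumptions: points are (x, y, z) tuples in a common world frame, a reach goal is met when a hypothetical hand site coincides with the goal point, and a view goal is met when the chosen object axis points along the direction from the object to the target. The function names are illustrative and are not part of the system described here.

```python
import math

Vec3 = tuple[float, float, float]


def reach_error(hand_site: Vec3, goal: Vec3) -> float:
    """Distance from the hand site to the reach goal (three positional DOF)."""
    return math.dist(hand_site, goal)


def view_error_deg(origin: Vec3, axis: Vec3, target: Vec3) -> float:
    """Angle (degrees) between an object axis and the direction to the target.

    origin is the point on the object from which the axis emanates (for
    example, a point between the eyes); 0 means the view goal is satisfied.
    """
    direction = tuple(t - o for t, o in zip(target, origin))
    norm_d = math.hypot(*direction)
    norm_a = math.hypot(*axis)
    if norm_d == 0.0 or norm_a == 0.0:
        raise ValueError("target coincides with origin, or axis is zero")
    cos_angle = sum(d * a for d, a in zip(direction, axis)) / (norm_d * norm_a)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
```

An inverse kinematics solver would drive both quantities toward zero; the sketch only measures them.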

Within an animation environment, such goals represent points in space (for position goals) or coordinate reference frames (for position and rotation goals), ultimately specified numerically with respect to a coordinate system. Within the natural language environment, the goals are not coordinates but are instead represented by objects, as in the commands:

J, look at switch twF-1. 

S, turn switch tglJ-1 on. 

The exact locations of the switches are unimportant at the language level. Somehow, however, the switch name tglJ-1 must be mapped to the appropriate switch on the panel in the animation environment. The same process must be followed for the target object toward which an object axis must be aligned in a view change. The problem thus reduces to one of object referencing.

4.2 Object Referencing 

In general, all objects have names. Although the names may differ between the animation and language environments, providing a map between them is not difficult. This, of course, assumes a one-to-one correspondence between the names. Such a requirement, however, defeats the goal of independence between the environments.

The problem domain specifically includes control panels. From a task specification perspective, a control panel is a very complex object consisting of many features such as controls and indicators. From a computer graphics perspective, the most salient feature of the control panel is its appearance, not necessarily the detailed geometry of the individual switches. An object such as a control panel can be represented most efficiently as a single texture image mapped onto a polygon. The alternative of modeling each individual switch would require a large number of polygons and an extensive amount of digitizing work to obtain a visually adequate representation of the switches.

By allowing each environment to represent the panel in the manner best suited to the way it will be referenced, the one-to-one correspondence between names is lost: the many objects in the task specification environment all correspond to a single texture-mapped panel. A method is therefore needed to construct a mapping from feature names in the task specification environment to texture map locations in the animation environment.

We used a paint program as the basis for such a tool. Since a paint program creates texture maps in image space, additional input was required to specify the polygon onto which the image is to be mapped. With that information, important locations on the texture map could be identified and given attributes (e.g., switch or indicator, rotary control or push button), and the corresponding locations on the polygon were calculated. The output of this tool provided input to both the semantic knowledge base and the geometric database.
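The calculation of polygon locations from image locations can be illustrated with bilinear interpolation over a quadrilateral. This Python sketch assumes, as our own simplification, that the whole image is stretched over a four-cornered polygon whose corners are given in the panel's coordinate system; the corner naming convention and the function name are hypothetical, and the actual paint program may have used different texture mapping transformations.

```python
Vec3 = tuple[float, float, float]


def image_to_polygon(px: float, py: float, width: int, height: int,
                     c00: Vec3, c10: Vec3, c01: Vec3, c11: Vec3) -> Vec3:
    """Map image location (px, py) to a 3D point on a textured quadrilateral.

    c00 receives image corner (0, 0), c10 receives (width, 0),
    c01 receives (0, height) and c11 receives (width, height).
    """
    u = px / width   # normalized horizontal texture coordinate
    v = py / height  # normalized vertical texture coordinate
    w00, w10 = (1 - u) * (1 - v), u * (1 - v)
    w01, w11 = (1 - u) * v, u * v
    return tuple(w00 * a + w10 * b + w01 * c + w11 * d
                 for a, b, c, d in zip(c00, c10, c01, c11))
```

When the polygon is a planar parallelogram, bilinear interpolation reduces to an affine map, so every computed point lies exactly on the polygon.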

4.2.1 The Knowledge Base 

The knowledge base needs to contain information about object names and hierarchies, but need not be concerned with actual geometry or location. Furthermore, as the task specifications and object definitions become more complex, the knowledge base can contain causality relationships; for example, turning switch tglJ-1 on may cause some other object to move or change state [5]. We use a frame-like knowledge base called DC-RL to store semantic information [8].

Object information must be entered into the knowledge base manually, as it can differ for each control panel, but the name mapping program described above can be used to specify the linkages into the animation environment.

For example, here is a section of an actual map file. 

{ concept ctrlpanel from panelfig
  having (
    [role twF-1 with
      [value = ctrlpanel.panel.twf_1]]
    [role twF-2 with
      [value = ctrlpanel.panel.twf_2]]
    [role twF-3 with
      [value = ctrlpanel.panel.twf_3]]
    [role tglJ-1 with
      [value = ctrlpanel.panel.tglj_1]]
    [role tglJ-2 with
      [value = ctrlpanel.panel.tglj_2]]
  )
}

The names twF-1, twF-2, and tglJ-1 correspond to the names of switches manually created in the existing knowledge base panel description called panelfig. These names are mapped to the corresponding names in the animation environment (e.g., ctrlpanel.panel.twf_1) and are guaranteed to match, because the actual object within the animation environment is automatically generated.
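Reading such a map file to resolve language-level names is straightforward. The Python sketch below extracts role/value pairs with a regular expression; it assumes the simple layout shown above and is not the DC-RL parser, which handles a far richer frame syntax.

```python
import re

# Matches entries of the form "[role NAME with [value = TARGET]]",
# allowing arbitrary whitespace (including newlines) between tokens.
ROLE_ENTRY = re.compile(
    r"\[role\s+(?P<role>\S+)\s+with\s*\[\s*value\s*=\s*(?P<value>[^\]\s]+)\s*\]\s*\]"
)


def parse_map_file(text: str) -> dict[str, str]:
    """Return a mapping from knowledge-base role names to animation names."""
    return {m["role"]: m["value"] for m in ROLE_ENTRY.finditer(text)}


SAMPLE = """
{ concept ctrlpanel from panelfig
  having (
    [role twF-1 with
      [value = ctrlpanel.panel.twf_1]]
    [role tglJ-1 with
      [value = ctrlpanel.panel.tglj_1]]
  )
}
"""

if __name__ == "__main__":
    names = parse_map_file(SAMPLE)
    # Animation-side site name for the language-level switch name tglJ-1.
    print(names["tglJ-1"])
```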


4.2.2 The Geometric Database 

The geometric database is called the Peabody Environment Network (or just peabody). In peabody, a figure is composed of a set of segments, each of which may have geometry associated with it. The geometry within each segment is defined within its own local coordinate system. Joints connect segments at attachment points called sites. A joint is actually a transformation between sites, and hence sites have an orientation as well as a location. Segments can have any number of sites, and it is through those sites that the interesting points on the texture map are identified for the animation environment.
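The relationship between segments and sites can be sketched with homogeneous transforms. In this Python sketch, which is our own illustration rather than peabody's implementation, each site stores a 4x4 transform relative to its segment's local coordinate system, and a site's world position follows from composing that transform with the segment's placement in the world. Joints, which would relate sites on different segments, are omitted because the panel needs none.

```python
from dataclasses import dataclass, field

Matrix = list[list[float]]


def trans(x: float, y: float, z: float) -> Matrix:
    """Homogeneous translation matrix, analogous to peabody's trans(...)."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


@dataclass
class Site:
    name: str
    location: Matrix  # pose relative to the owning segment's local frame


@dataclass
class Segment:
    name: str
    sites: dict[str, Site] = field(default_factory=dict)

    def add_site(self, site: Site) -> None:
        self.sites[site.name] = site


def site_world_position(segment_to_world: Matrix, site: Site) -> tuple[float, float, float]:
    """World-space origin of a site, given where its segment sits in the world."""
    m = matmul(segment_to_world, site.location)
    return (m[0][3], m[1][3], m[2][3])
```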


The relevant part of the peabody description of the 
panel figure is shown: 

figure ctrlpanel {
    segment panel {
        psurf = "panel.pss";
        site base->location = trans(0.00cm, 0.00cm, 0.00cm);
        site twf_1->location = trans(13.25cm, 163.02cm, 80.86cm);
        site twf_2->location = trans(64.78cm, 115.87cm, 95.00cm);
        site twf_3->location = trans(52.84cm, 129.09cm, 91.43cm);
        site tglj_1->location = trans(72.36cm, 158.77cm, 81.46cm);
        site tglj_2->location = trans(9.15cm, 115.93cm, 94.98cm);
    }
}

This entire file is automatically generated from the map file. Since the panel is a rigid object with no movable parts, no joints are required. The location of each site (each of which represents a different switch) was calculated in the paint program, which created the file, by applying the texture mapping transformations normally applied when the image is rendered.
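Generating such a description from the computed site locations amounts to simple text formatting. The Python sketch below emits a figure in the layout shown above; the function name and its inputs are hypothetical, and it covers only the case of a single rigid segment whose sites are pure translations.

```python
Vec3 = tuple[float, float, float]


def peabody_figure(figure: str, segment: str, psurf: str,
                   sites: dict[str, Vec3]) -> str:
    """Emit a one-segment peabody figure whose sites are translations in cm."""
    lines = [f"figure {figure} {{",
             f"    segment {segment} {{",
             f'        psurf = "{psurf}";']
    for name, (x, y, z) in sites.items():
        lines.append(f"        site {name}->location = "
                     f"trans({x:.2f}cm, {y:.2f}cm, {z:.2f}cm);")
    lines += ["    }", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(peabody_figure("ctrlpanel", "panel", "panel.pss", {
        "base": (0.0, 0.0, 0.0),
        "twf_1": (13.25, 163.02, 80.86),
    }))
```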


4.3 Creating an Animation 

Mapping objects from the task description environment to the animation environment provides one of the crucial links needed for creating an animation. The language processor provides another. Our Motion-Verb Parser (MVP) [5] uses both a subset of natural language and an artificial language (NASA checklists) for its syntax. Information obtained during the parse is stored in the semantic knowledge base DC-RL. The natural language task descriptions included in the problem domain are such that a single animation key frame can be developed from a single command: each part of speech fills in slots in an animation command template.
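The slot-filling step can be illustrated with a deliberately tiny Python sketch. It recognizes only the two verbs used in this paper with a regular expression, which is a stand-in for MVP and not a model of it; the head and hand site names, the (1,0,0) view axis, and the per-agent choice of hand are hypothetical values that mirror the commands in Figure 3. Object names are matched case-insensitively, an assumption we make because the script writes twf-1 while the map file writes twF-1.

```python
import re

# Hypothetical grammar: <agent>[,] (look at | turn) [switch] <object> ...
COMMAND = re.compile(
    r"^(?P<agent>\w+),?\s+(?P<verb>look at|turn)\s+(?:switch\s+)?(?P<obj>[\w-]+)"
)


def to_animation_command(sentence: str, name_map: dict[str, str],
                         figures: set[str], hands: dict[str, str]) -> str:
    """Fill an animation command template from one task sentence."""
    m = COMMAND.match(sentence.strip())
    if m is None:
        raise ValueError(f"unrecognized command: {sentence!r}")
    agent, verb, obj = m["agent"], m["verb"], m["obj"]

    # Goal slot: another figure's eyes, or a panel site found via the name map.
    if obj in figures:
        goal = f"{obj}.bottom_head.between_eyes"
    else:
        lowered = {k.lower(): v for k, v in name_map.items()}
        goal = lowered[obj.lower()]

    if verb == "look at":
        # View change: point the agent's (hypothetical) eye axis at the goal.
        return f'point_at("{goal}", "{agent}.bottom_head.between_eyes", (1,0,0));'
    # "turn" implies a reach: bring the agent's fingertips to the goal site.
    return f'reach_site("{goal}", "{agent}.{hands[agent]}.fingers_distal");'
```

The requested switch state ("to state 4", "on") is simply discarded here; a fuller system would have to carry it elsewhere, for instance into the knowledge base.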

Figure 3 shows the relationship between the task specification and the animation commands. A “turn” command specifies a reach which can be solved using inverse kinematics; a “look at” command specifies an orientation change which can also be solved using inverse kinematics [6, 14]. Frames from an animation created using the script shown in Section 4 are shown in Figure 4.
J look at switch twf-1.
J turn twf-1 to state 4.
S look at tglJ-1.
S turn tglJ-1 on.

point_at("ctrlpanel.panel.twf_1", "J.bottom_head.between_eyes", (1,0,0));
reach_site("ctrlpanel.panel.twf_1", "J.right_hand.fingers_distal");
point_at("ctrlpanel.panel.tglj_1", "S.bottom_head.between_eyes", (1,0,0));
reach_site("ctrlpanel.panel.tglj_1", "S.left_hand.fingers_distal");

Figure 3: Natural Language Input and Animation Commands


5 Default Timing Constructs 

Given that the basic key frames can be generated from a natural language task description, creating the overall animation can still be somewhat difficult. Techniques that create motion by animating the solution algorithm, such as those of Badler, Manoochehri and Walters [6], Witkin, Fleischer and Barr [24], or Barzel and Barr [7], are themselves inappropriate for task performance analysis. Instead, the positions created must be taken for what they are: the desired configuration of the body at a particular time. The exact time, however, is unknown, unspecified, or arbitrary.

The timing of actions could be explicitly specified in the input, but (language-based) task descriptions do not normally indicate time. Alternatively, the times at which actions occur can be chosen arbitrarily and a reasonable task animation still produced; in fact, much animator effort is normally required to position key postures in time. There are, however, more principled ways of estimating task duration.

Several factors affect task performance times, for example: level of expertise, desire to perform the task, degree of fatigue (mental and physical), distance to be moved, and target size. Realistically speaking, all of these need to be considered in the model, yet some are difficult to quantify. Obviously, the farther the distance to be moved, the longer a task should take. Furthermore, it is intuitively accepted that a task requiring precision work should take longer than one that does not: for example, threading a needle versus putting papers on a desk.

Fitts [12] and Fitts and Peterson [13] investigated performance time with respect to two of the above factors, distance to be moved and target size. They found that amplitude (A, the distance to be moved) and target width (W) are related to time by a simple equation:

$$\text{Movement Time} = a + b \log_2 \frac{2A}{W}$$

where a and b are constants. In this formulation, an index of movement difficulty is manipulated by the ratio of target width to amplitude and is given by:

$$I_d = \log_2 \frac{2A}{W}$$

This index of difficulty expresses the tradeoff between speed and accuracy in movement. Since A is fixed for any particular task, the only way to decrease the performance time is to increase W, the only other variable in the equation. That is, the faster a task is to be performed, the larger the effective target area must be, and hence the less accurate the movements are.

This equation (known as Fitts’ Law) can be embedded in the animation system, since for any given reach task both A and W are known. The constants a and b are linked to the other factors, such as training, desire, fatigue, and the body segments to be moved; they must be determined empirically. For button tapping tasks, Fitts [13] determined the mean time (MT) to be


$$MT = 74\, I_d - 70 \text{ msec}$$
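As a worked example, the equations above can be evaluated directly. The Python sketch below defaults to the tapping constants a = -70 msec and b = 74 msec per bit implied by that result; these constants are specific to that task, so any other values passed in are assumptions that would need empirical calibration. A and W only need to share a unit, since only their ratio enters the formula.

```python
import math


def index_of_difficulty(amplitude: float, width: float) -> float:
    """Fitts' index of difficulty I_d = log2(2A / W), in bits."""
    if amplitude <= 0.0 or width <= 0.0:
        raise ValueError("amplitude and width must be positive")
    return math.log2(2.0 * amplitude / width)


def movement_time_ms(amplitude: float, width: float,
                     a_ms: float = -70.0, b_ms_per_bit: float = 74.0) -> float:
    """Fitts' Law MT = a + b * I_d; defaults are the tapping-task constants."""
    return a_ms + b_ms_per_bit * index_of_difficulty(amplitude, width)


if __name__ == "__main__":
    # A 16 cm reach to a 1 cm wide target: I_d = log2(32) = 5 bits.
    print(movement_time_ms(16.0, 1.0))  # 74 * 5 - 70 = 300 msec
```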


Although Fitts’ Law has been found to hold for a variety of movements, including arm movements (A = 5 to 30 cm) and wrist movements (A = 1.3 cm) [9, 16, 18], its application to 3D computer animation is only approximate. The constants differ for each limb and are only valid within a certain range of movement amplitudes in 2D space; therefore, extrapolating the data outside that range and into three-dimensional space has no validated experimental basis.

Nonetheless, Fitts’ Law provides a reasonable and easily computed basis for approximating movement durations. Should a more exact model be developed, it should fit readily into a 3D computer animation environment in which default task durations must be computed.
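To show how such estimates could replace arbitrary key frame times, the Python sketch below assigns default start and end times to a sequence of reaches. It assumes, purely for illustration, that each figure performs its own commands one after another in script order while different figures may act concurrently, and it reuses the tapping-task constants; view changes, which Fitts’ Law does not directly model, are left out. The action labels, amplitudes, and widths in any call are hypothetical.

```python
import math
from collections import defaultdict


def fitts_ms(amplitude: float, width: float,
             a_ms: float = -70.0, b_ms_per_bit: float = 74.0) -> float:
    """Fitts' Law movement time in msec (tapping-task constants by default)."""
    return a_ms + b_ms_per_bit * math.log2(2.0 * amplitude / width)


def schedule_reaches(actions: list[tuple[str, str, float, float]]
                     ) -> list[tuple[str, str, float, float]]:
    """Assign (start, end) times in msec to reaches listed in script order.

    Each action is (agent, label, amplitude, width). Each agent works
    sequentially; different agents keep independent clocks.
    """
    clock: dict[str, float] = defaultdict(float)
    timeline = []
    for agent, label, amplitude, width in actions:
        start = clock[agent]
        # Clamp at zero: the linear fit can go negative for very easy moves.
        duration = max(0.0, fitts_ms(amplitude, width))
        clock[agent] = start + duration
        timeline.append((agent, label, start, clock[agent]))
    return timeline
```

The resulting times would serve as default key frame times for the postures produced by inverse kinematics.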
6 Conclusions and Future Work

One of the goals of the Computer Graphics Research Lab at the University of Pennsylvania is to develop human task performance analysis tools specifically for users who are engineers and not particularly likely to be animators. Higher-level animation tools are deemed essential to meeting this goal. We have demonstrated the feasibility of building a complete pipeline of processes beginning with natural language input, proceeding through semantic resolution of simple tasks, default task time durations, and object references, and ultimately terminating in inverse kinematic positioning and rendered graphics. The pipeline confronts the issues of establishing appropriate linkages between objects, time, and actions at the language and geometric levels without adopting ad hoc solutions such as the selection of predefined key frames or the use of fixed default timings.

Of course, the model is quite incomplete in many respects, but we have work in progress in many areas, including:

• Extending the knowledge base to more complex task verbs and more general object environments.

• Extending the animation interface to include dynamics and constraints as well as inverse kinematics.

• Extending the task processor to a more general task simulator which handles temporal expressions, resource management, and task interruption.

• Extending the panel editor to permit on-line changes to panel object locations and semantics.

Ultimately, the user should be able to control most aspects of the animation (except the creation of the actual geometric environment) through a language-based interface. This will include the ability to parameterize (1) bodies, (2) object and object feature locations, and (3) tasks. With this capability, experiments can be performed without descending to the key frame level for animation.